Research Title

A First Approach to Rome's Complaints Call Center: Task Prediction with DHR Models. A Comparative Analysis


Research Type

[MoD | method-over-data]


Abstract

Life in Rome is characterized by chaos and frenetic activity. The city spans a vast area with a limited underground (metro) network, and it faces challenges such as incursions of animals from the surrounding countryside, widespread traffic congestion, air pollution, and serious problems with garbage collection, locally known as “monnezza”. This paper examines how the efforts to address these issues are organized, including the roles, number, and responsibilities of the people involved. We analyze reports and complaints submitted through several channels (the Citizen’s Digital Home, the Call Center, the Help Desk, and email) by resident and non-resident citizens, tourists, and city users. The resulting insights are shaped by the socio-political structure of Rome and by the diverse objectives of the Call Center.

To better understand the city’s issues and potentially support city officials, we have tried several approaches to time series forecasting. We started with the most general case, predicting only the total number of complaints. From an operational point of view, however, predictions by municipality or thematic area could be more relevant. We will therefore compare predictions for individual subsets of complaints with predictions for all complaints, integrating all the information where possible. If a single type of information (municipality or theme only) gives better results, we will explore further improvements to that model.


Dataset Description

The first version of our dataset contains several key variables related to the management of urban issues in Rome, including:

  • Struttura di prima assegnazione: The initial office or structure to which the report was assigned.
  • Anno: The year the report was submitted. It is not relevant in this first step, which covers only 2022, but it lets us easily extend the analysis to 2021-2022 and possibly to other years.
  • Data di presentazione: The date the report was registered.
  • Tipo di segnalazione - Descrizione: The description of the type of report made.
  • Descrizione Area Tematica: The thematic area to which the report belongs.
  • Descrizione Argomento: The specific topic of the report.
  • Descrizione tipo arrivo: The channel through which the report was received (e.g., Call Center, Email).
  • Punto esclamativo SI/NO: Indicator of urgent reports.
  • Municipio di riferimento: The municipality of Rome to which the report pertains.
  • Mese: The month the report was registered.

We first used the submission date to build our time series. In this first step, we use the other available variables as regressors; we recognize, however, that further data preprocessing could improve our predictions.

Identifying specific dynamics at the municipality or thematic area level could be particularly relevant. Additionally, considering the Struttura di prima assegnazione might provide greater insight into task distribution and aid in urban management.

From a robustness perspective, some thematic areas with few reports, such as those concerning historical assets or legal, commercial, and notarial assistance services, may not be worth including in our model. Examples of thematic areas with few reports include:

  • Legal, Commercial, and Notarial Assistance Service
  • Innovation and Smart City Projects
  • Acquisition and Disposal Procedures
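Excluding sparse thematic areas can be done with a simple frequency threshold. The following is a minimal base-R sketch, assuming a data frame with one row per report; the helper `keep_frequent`, the threshold, and the toy rows are hypothetical, not taken from our pipeline.

```r
# Hypothetical helper: keep only the reports whose category (column `col`)
# occurs at least `min_n` times in the data.
keep_frequent <- function(df, col, min_n) {
  counts <- table(df[[col]])
  frequent <- names(counts)[counts >= min_n]
  df[df[[col]] %in% frequent, , drop = FALSE]
}

# Toy data (not the real dataset): one row per report
reports <- data.frame(
  area = c(rep("GESTIONE RIFIUTI E PULIZIA URBANA", 5),
           rep("MOBILITA E TRASPORTI", 3),
           "INNOVAZIONE E SMART CITY"),
  stringsAsFactors = FALSE
)

filtered <- keep_frequent(reports, "area", min_n = 3)
table(filtered$area)   # the single "INNOVAZIONE E SMART CITY" report is dropped
```

The threshold is a modelling choice; merging sparse areas into an "other" category instead of dropping them keeps the total count unchanged.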

A better understanding of task distribution among these structures could help improve the management and effectiveness of responses to reports in the city of Rome.

We plot some variables against both Municipality and Month. This already gives a hint, but going into too much detail might not be useful. We believe that knowing the workload of a specific Municipality (location) in a given month is of interest; further prioritization based on how Rome is actually organized would be too complex as a first step and beyond our research aims.
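The Municipality-by-Month workload described above is a two-way count table. A minimal base-R sketch on simulated reports; the column names `municipio` and `mese` and the random data are hypothetical stand-ins for `Municipio di riferimento` and `Mese`.

```r
set.seed(1)
n_reports <- 200

# Simulated reports (hypothetical): municipality (1-15) and month of each report.
# Fixing the factor levels guarantees all 15 rows and 12 columns appear.
reports <- data.frame(
  municipio = factor(sample(1:15, n_reports, replace = TRUE), levels = 1:15),
  mese = factor(sample(month.abb, n_reports, replace = TRUE), levels = month.abb)
)

# Workload: number of reports per municipality (rows) and month (columns)
workload <- xtabs(~ municipio + mese, data = reports)
workload[1:3, 1:6]
```

With the real data, the same table could feed a heatmap or one time series per municipality.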

Report type (Tipo di segnalazione)        Count
AMA LineaVerde                            55322
Reclamo                                    6822
Richiesta informazioni e assistenza           1
Segnalazione                              47207

We observe that AMA (Rome's municipal waste management company, reached through its LineaVerde line) is contacted “seasonally”, mostly in January, July, October, and December, i.e., during vacation periods and adverse weather conditions (e.g., rain in November). This is likely due to reduced staffing in these periods, which leads to more service problems. Conversely, the number of “Segnalazioni” (reports) decreases during the hottest months, likely because of the vacation season.

library(knitr)

# Contingency table: MUNICIPALITY x REPORT TYPE ---------------------------
# table(data_2022$`Municipio di riferimento`, data_2022$`Tipo di segnalazione - Descrizione`)

#######################################################################
#   AMA LineaVerde Reclamo Segnalazione
#1            5768     389         7662
#2            3103     314         6038
#3            2964     147         3562
#4            1902     122         2324
#5            4481     322         5911
#6            7353     128         1789
#7            6579     280         5167
#8            3222     109         2172
#9            5424     155         2956
#10           5048     306         2850
#11           2246      93         1543
#12           2339     114         2364
#13           3090     118         1716
#14           5669     132         2459
#15           1823      68         2046
#######################################################################

# Mosaic plot of the MUNICIPALITY x REPORT TYPE contingency table
plot(table(data_2022$`Municipio di riferimento`,
           data_2022$`Tipo di segnalazione - Descrizione`),
     col = c("orange", "red", "magenta"),
     border = "white",
     main = "Contingency table: COMPLAINT TYPE\nBY MUNICIPALITY")

# legend("bottomright", legend = c("AMA LineaVerde", "Reclamo", "Segnalazione"),
#        fill = c("orange", "red", "magenta"))

# Deeper look at the MUNICIPALITY variable
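To compare municipalities by the composition of their workload rather than by raw volume, the counts in the commented contingency table can be turned into row percentages. A minimal base-R sketch; the matrix simply re-types the counts recorded in that comment.

```r
# Counts copied from the commented contingency table
# (rows: municipalities 1-15; columns: report type)
counts <- matrix(c(
  5768, 389, 7662,
  3103, 314, 6038,
  2964, 147, 3562,
  1902, 122, 2324,
  4481, 322, 5911,
  7353, 128, 1789,
  6579, 280, 5167,
  3222, 109, 2172,
  5424, 155, 2956,
  5048, 306, 2850,
  2246,  93, 1543,
  2339, 114, 2364,
  3090, 118, 1716,
  5669, 132, 2459,
  1823,  68, 2046
), ncol = 3, byrow = TRUE,
dimnames = list(Municipio = as.character(1:15),
                Tipo = c("AMA LineaVerde", "Reclamo", "Segnalazione")))

# Percentage of each report type within each municipality (rows sum to 100)
shares <- round(100 * prop.table(counts, margin = 1), 1)

# Municipalities with the highest share of AMA LineaVerde contacts
top3 <- shares[order(-shares[, "AMA LineaVerde"]), ][1:3, ]
top3
```

On these counts, municipalities 6, 14 and 9 have the largest share of AMA LineaVerde contacts (about 79%, 69% and 64% of their reports).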

Out of curiosity, we also looked at how digitized the reporting process is, and the picture is discouraging: only 0.006% of reports are made through the app, 0.096% through the Portal 060606 email, and 6.635% through Email To Case.

We expect this to limit the number of complaints from tourists, for example, who may not speak Italian and may not want to file a report by phone with a police officer who barely speaks their language (speaking from personal experience).

Arrival type            Percentage (%)
APP                              0.006
Call Center                     66.645
CDC                             24.107
Email Portale 060606             0.096
Email To Case                    6.635
URP                              2.504
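Percentages like those in the arrival-type table come from a one-way frequency table. A minimal base-R sketch on a hypothetical vector of channels (one entry per report); with the real data, the vector would be the `Descrizione tipo arrivo` column.

```r
# Hypothetical arrival channels, one entry per report
canale <- c(rep("Call Center", 6), rep("CDC", 3), "Email To Case")

# Percentage of reports by channel, rounded to 3 decimals as in the table
pct <- round(100 * prop.table(table(canale)), 3)
pct
```

Because of rounding, such percentages need not add up to exactly 100.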

1. All the relevant changes/adjustments/“revolutions” since the 1st milestone

Software/Hardware Toolkit

Compared with the first milestone, we now derive the time series from the dataset we assembled in the previous steps. The preprocessing steps are omitted here.

We have just started experimenting with the dhReg package and interpreting its results. In our first attempt, we set Frequency to c(24, 168) and XREG to NULL, meaning that we do not provide any regressors to help identify the model.

Main research aim

In our research, we aim to explore various forecasting models and methodologies to improve the prediction accuracy of time series data, specifically focusing on seasonal patterns and the integration of external regressors. We have started by delving into Dynamic Harmonic Regression (DHR) and have particularly consulted the following articles:

Rausch, T. M., Albrecht, T., & Baier, D. (2022). Beyond the beaten paths of forecasting call center arrivals: On the use of dynamic harmonic regression with predictor variables. Journal of Business Economics, 92, 675–706.

Li, T.-H. (2003). A Hierarchical Framework for Modeling and Forecasting Web Server Workload.

In addition, we intend to further investigate tree-based methods by referencing:

Vercauteren, T., Aggarwal, P., Wang, X., & Li, T.-H. (2007). Hierarchical Forecasting of Web Server Workload Using Sequential Monte Carlo Training. IEEE Transactions on Signal Processing, 55, 1162-1184.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont.

We plan to utilize classification and regression trees (CART) as a dimensionality reduction technique and to understand how these methods can be applied to improve our forecasting models.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
  library(tsibble)
  # dhReg ----------------------------------
  library(dhReg)
})

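# Multivariate time series -----------------------
# Daily counts by report type and thematic area (long format).
# Named ts_multivariate to avoid masking the base function ts().
ts_multivariate <- data_2022 %>%
  group_by(`Data di Presentazione`,
           `Tipo di segnalazione - Descrizione`,
           `Descrizione Area Tematica`) %>%
  summarise(Numero_Segnalazioni = n(), .groups = "drop")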


# TS --------------------------------------
# Univariate time series: total number of complaints per day
ts_univariate <- data_2022 %>%
  group_by(`Data di Presentazione`) %>%
  summarise(Numero_Segnalazioni = n()) %>%
  ungroup()

# First DHR models ---------------------------------
# Fitting the total number of complaints with a simple DHR model
res1 <- dhr(ts_univariate$Numero_Segnalazioni,
            XREG = NULL,                # No external regressors
            Range = list(1:2, 1),       # Candidate numbers of Fourier terms per period
            Frequency = c(24, 168),     # Seasonal periods, in number of observations
            Criteria = "aicc",          # Model selection criterion
            maxp = 5,                   # Maximum AR order in auto.arima
            maxq = 5,                   # Maximum MA order in auto.arima
            maxd = 5)                   # Maximum order of differencing

res1
## Series: Data 
## Regression with ARIMA(2,1,5) errors 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2      ma3      ma4     ma5    S1-24
##       0.8625  -0.7088  -1.5797  1.1844  -0.2037  -0.5564  0.3084  -0.0659
## s.e.  0.0725   0.0625   0.0817  0.1469   0.1623   0.1318  0.0624   0.0267
##        C1-24   S2-24    C2-24  S1-168  C1-168
##       0.0588  0.0644  -0.0093   0.038  0.3045
## s.e.  0.0267  0.0294   0.0295   0.125  0.1277
## 
## sigma^2 = 0.1234:  log likelihood = -130.38
## AIC=288.76   AICc=289.96   BIC=343.32
# Forecast with the DHR model -------------------------------------
pred1 <- fc(Frequency = c(24, 168), XREG_test = NULL, h = 10,
            Fit = res1, Data = ts_univariate$Numero_Segnalazioni)

plot(pred1)

ARIMA(2,1,5): in this regression with ARIMA errors, the errors follow a model with autoregressive (AR) order 2, differencing (I) order 1, and moving average (MA) order 5.

Box Cox transformation: lambda = 0: the Box Cox transformation is applied to stabilize the variance of the time series. A value of lambda = 0 corresponds to a log transformation, so the model is fitted to the logarithm of the daily counts.
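Why lambda = 0 means "log": the Box-Cox transform is (y^lambda - 1) / lambda, whose limit as lambda goes to 0 is log(y). A minimal sketch of the standard definition; the helper name `box_cox` and the count 250 are illustrative only.

```r
# Standard Box-Cox transform; lambda = 0 is defined as its limit, log(y)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- 250              # an illustrative daily count
box_cox(y, 0)         # log(250)
box_cox(y, 1e-6)      # very close to log(250)
box_cox(y, 1)         # y - 1: a shift only, no variance stabilization
```

Forecasts made on the log scale are transformed back with exp(), which is why prediction intervals for such models are usually asymmetric around the point forecast.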

Coefficients:
  • ar1, ar2, ma1, ma2, ma3, ma4, ma5: These are coefficients for the autoregressive (AR) and moving average (MA) terms of the ARIMA model. They capture the temporal dependencies in the time series data.
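  • S1-24, C1-24, S2-24, C2-24, S1-168, C1-168: coefficients of the Fourier (sine and cosine) terms for seasonal periods of 24 and 168 observations, i.e., a daily and a weekly cycle if the data were hourly. Since ts_univariate has one observation per day, these terms actually describe cycles of 24 and 168 days.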
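The Fourier terms are ordinary regressors: for a period m and harmonic k they are sin(2πkt/m) and cos(2πkt/m). The sketch below builds them in base R with the same "S1-24"/"C1-24" naming seen in the output, which follows the forecast::fourier() convention. The helper `fourier_terms` is hypothetical, and we assume, without having checked the dhReg source, that dhr() builds equivalent columns internally. The period is counted in observations, so for our daily series a weekly cycle would use period 7.

```r
# Hypothetical helper: K sine/cosine pairs for one seasonal period
fourier_terms <- function(n, period, K) {
  t <- seq_len(n)
  out <- matrix(NA_real_, nrow = n, ncol = 2 * K)
  cn <- character(2 * K)
  for (k in seq_len(K)) {
    out[, 2 * k - 1] <- sin(2 * pi * k * t / period)
    out[, 2 * k]     <- cos(2 * pi * k * t / period)
    cn[2 * k - 1] <- paste0("S", k, "-", period)
    cn[2 * k]     <- paste0("C", k, "-", period)
  }
  colnames(out) <- cn
  out
}

# The regressors selected in res1: K = 2 for period 24, K = 1 for period 168
X <- cbind(fourier_terms(365, 24, 2), fourier_terms(365, 168, 1))
colnames(X)

# For daily data, a weekly cycle would instead use, e.g.:
X_weekly <- fourier_terms(365, 7, 3)
```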

Model statistics:
  • sigma^2 = 0.1234: the estimated variance of the error term (on the log scale, because of the Box Cox transformation).
  • log likelihood = -130.38: higher values indicate a better fit.
  • AIC (Akaike Information Criterion) = 288.76: lower values indicate better models.
  • AICc (corrected AIC) = 289.96: adjusts AIC for small sample sizes.
  • BIC (Bayesian Information Criterion) = 343.32: like AIC, it balances goodness of fit and model complexity, but it penalizes complex models more heavily. Lower values indicate better models.
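The three criteria follow directly from the log likelihood. The sketch below reproduces the printed values under two assumptions: the forecast package's conventions (the parameter count includes the error variance, and the sample size excludes observations lost to differencing) and a series of 365 daily observations.

```r
# Values read from the printed res1 output (log likelihood rounded to 2 decimals)
loglik <- -130.38
k <- 2 + 5 + 6 + 1   # 2 AR + 5 MA + 6 Fourier coefficients + the error variance
n <- 365 - 1         # assumed: 365 daily observations, one lost to first differencing

aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)
bic  <- -2 * loglik + k * log(n)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
# AIC 288.76, AICc 289.96, BIC 343.32
```

Because n depends on the differencing order, criteria from models with different d are computed on different samples and should not be compared directly.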

Overall, the model indicates autoregressive and moving average dependencies in the series, together with a seasonal component. The AICc selected the largest number of Fourier terms allowed by Range (two sine/cosine pairs for period 24 and one pair for period 168), which suggests that seasonality matters and that a wider Range is worth trying. For this reason, we also plan to try a SARIMA model.

Before moving to different models, we wanted to exploit the other variables in the dataset by passing them as regressors through the XREG argument. This step requires further preprocessing. We use a variation of one-hot encoding: instead of a 0-1 matrix, each column holds the daily number of complaints for one combination of report type and thematic area.
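A base-R equivalent of this count encoding, on a few hypothetical toy reports, makes one property visible: on every day, the columns add up exactly to that day's total number of complaints, which is the quantity we are trying to predict.

```r
# Toy reports (hypothetical): date, report type, and thematic area
reports <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01",
                   "2022-01-02", "2022-01-02")),
  type = c("Reclamo", "Segnalazione", "Segnalazione",
           "AMA LineaVerde", "Segnalazione"),
  area = c("AMBIENTE", "AMBIENTE", "SCUOLA",
           "GESTIONE RIFIUTI E PULIZIA URBANA", "AMBIENTE")
)

# One column per type x area combination ("_" as in pivot_wider's default),
# one row per date, cells = number of reports
combo <- paste(reports$type, reports$area, sep = "_")
wide <- as.data.frame.matrix(table(reports$date, combo))
wide

# The row sums equal the daily totals
daily_total <- as.vector(table(reports$date))
rowSums(wide) == daily_total
```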

# One Hot Encoding -------------------------------
suppressPackageStartupMessages({
library(tidyr)
})

# Expand the matrix

# 1. Aggregate the data to count the reports for each combination of date, report type, and thematic area
aggregated_data <- data_2022 %>%
  group_by(`Data di Presentazione`, `Tipo di segnalazione - Descrizione`, `Descrizione Area Tematica`) %>%
  summarise(Numero_Segnalazioni = n(), .groups = 'drop')

# 2. Use pivot_wider to transform the data into a wide format
wide_data <- aggregated_data %>%
  pivot_wider(names_from = c(`Tipo di segnalazione - Descrizione`, `Descrizione Area Tematica`),
              values_from = Numero_Segnalazioni,
              values_fill = list(Numero_Segnalazioni = 0))

# Keep columns 2-43 (column 1 is the date): daily counts per type x area combination
wide_data_NODATE <- wide_data[, 2:43] %>%
  mutate(across(everything(), as.numeric))

res2 <- dhr(ts_univariate$Numero_Segnalazioni,
            XREG = as.matrix(wide_data_NODATE),   # Daily counts per type x area combination
            Range = list(1:2, 1),       # Candidate numbers of Fourier terms per period
            Frequency = c(24, 168),     # Seasonal periods, in number of observations
            Criteria = "aicc",          # Model selection criterion
            maxp = 5,                   # Maximum AR order in auto.arima
            maxq = 5,                   # Maximum MA order in auto.arima
            maxd = 5)                   # Maximum differencing order in auto.arima

print(res2)
## Series: Data 
## Regression with ARIMA(2,0,0) errors 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##          ar1      ar2  intercept    S1-24   C1-24   S1-168   C1-168
##       0.4653  -0.0394     4.4409  -0.0196  0.0041  -0.0060  -0.0029
## s.e.  0.0635   0.0617     0.0340   0.0178  0.0179   0.0213   0.0205
##       AMA LineaVerde_GESTIONE RIFIUTI E PULIZIA URBANA
##                                                  3e-03
## s.e.                                             1e-04
##       Reclamo_OPERE E MANUTENZIONE DELLA CITTA  Reclamo_SERVIZI MUNICIPALI
##                                         0.0122                     -0.0006
## s.e.                                    0.0042                      0.0031
##       Reclamo_SICUREZZA URBANA  Segnalazione_ANAGRAFE E SERVIZI CIVICI
##                         0.0065                                  0.0117
## s.e.                    0.0063                                  0.0069
##       Segnalazione_MOBILITA E TRASPORTI
##                                  0.0021
## s.e.                             0.0029
##       Segnalazione_OPERE E MANUTENZIONE DELLA CITTA
##                                              0.0063
## s.e.                                         0.0007
##       Segnalazione_SERVIZI MUNICIPALI  Segnalazione_SICUREZZA URBANA
##                                0.0047                         0.0049
## s.e.                           0.0018                         0.0007
##       Segnalazione_TURISMO  Reclamo_GESTIONE RIFIUTI E PULIZIA URBANA
##                    -0.0168                                    -0.0057
## s.e.                0.0243                                     0.0087
##       Reclamo_MOBILITA E TRASPORTI  Reclamo_TRIBUTI E CONTRAVVENZIONI
##                            -0.0013                             0.0097
## s.e.                        0.0038                             0.0068
##       Segnalazione_AMBIENTE  Segnalazione_GESTIONE RIFIUTI E PULIZIA URBANA
##                      0.0034                                          0.0090
## s.e.                 0.0008                                          0.0061
##       Segnalazione_INNOVAZIONE E SMART CITY  Segnalazione_PATRIMONIO
##                                     -0.0079                  -0.0075
## s.e.                                 0.0092                   0.0132
##       Segnalazione_TRIBUTI E CONTRAVVENZIONI  Reclamo_ANAGRAFE E SERVIZI CIVICI
##                                       0.0110                             0.0061
## s.e.                                  0.0042                             0.0065
##       Reclamo_CASA E URBANISTICA  Reclamo_INNOVAZIONE E SMART CITY
##                           0.0006                           -0.0111
## s.e.                      0.0077                            0.0138
##       Reclamo_SCUOLA  Reclamo_SOCIALE  Segnalazione_CASA E URBANISTICA
##              -0.0045           0.0208                            8e-04
## s.e.          0.0196           0.0099                            5e-03
##       Segnalazione_CULTURA  Segnalazione_SOCIALE  Reclamo_AMBIENTE
##                     0.0004               -0.0103            0.0004
## s.e.                0.0057                0.0216            0.0051
##       Segnalazione_COMMERCIO E IMPRESA  Segnalazione_SPORT  Segnalazione_SCUOLA
##                                 0.0075             -0.0070               0.0097
## s.e.                            0.0069              0.0337               0.0017
##       Segnalazione_DIRITTI E PARI OPPORTUNITA  Reclamo_CULTURA
##                                        0.0141           0.0264
## s.e.                                   0.0050           0.0109
##       Reclamo_COMMERCIO E IMPRESA  Reclamo_DIRITTI E PARI OPPORTUNITA
##                            0.0353                               0.004
## s.e.                       0.0235                               0.008
##       Reclamo_SPORT  Segnalazione_ALTRI SERVIZI  Reclamo_ALTRI SERVIZI
##             -0.0076                      0.0098                 0.0200
## s.e.         0.0448                      0.0239                 0.0564
##       Segnalazione_GESTIONE DELL'ENTE  Reclamo_GESTIONE DELL'ENTE
##                                0.0015                      0.0236
## s.e.                           0.0259                      0.0370
##       Reclamo_PATRIMONIO  Reclamo_TURISMO  Segnalazione_SANITA E SALUTE
##                  -0.0287          -0.0283                       -0.0624
## s.e.              0.0337           0.0835                        0.0607
## 
## sigma^2 = 0.02276:  log likelihood = 198.61
## AIC=-297.21   AICc=-280.97   BIC=-102.22
# Forecast with the second DHR model -------------------------------------
# NOTE: XREG_test reuses the training regressors, so these 365 "future" values
# are not a genuine out-of-sample forecast
pred2 <- fc(Frequency = c(24, 168), h = 365, XREG_test = as.matrix(wide_data_NODATE),
            Fit = res2, Data = ts_univariate$Numero_Segnalazioni)

plot(pred2)

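This second model has a much lower AICc (-280.97 vs. 289.96), but the comparison is not fair. The two models use different orders of differencing (d = 0 vs. d = 1), which makes their information criteria non-comparable, and the regressors are the category-level counts that make up the very total we are predicting, so they would not be known in advance when forecasting. The fit also depends heavily on the regressor matrix, and some of its columns are very likely too sparse to help. We should consider merging some of them where this makes logical sense, removing them, or applying feature engineering.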
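Merging sparse columns can be as simple as summing every column below a total-count threshold into a single "OTHER" column, which keeps the daily totals unchanged. A minimal base-R sketch; the helper `lump_sparse`, the threshold, and the toy values are hypothetical.

```r
# Hypothetical helper: merge columns whose total count is below `min_total`
# into a single OTHER column; row sums are preserved.
lump_sparse <- function(counts, min_total) {
  totals <- colSums(counts)
  keep <- totals >= min_total
  out <- counts[, keep, drop = FALSE]
  if (any(!keep)) {
    out$OTHER <- rowSums(counts[, !keep, drop = FALSE])
  }
  out
}

# Toy daily counts (hypothetical values)
counts <- data.frame(
  Segnalazione_AMBIENTE = c(12, 9, 15),
  Reclamo_SPORT = c(0, 1, 0),
  Reclamo_TURISMO = c(1, 0, 0)
)

lumped <- lump_sparse(counts, min_total = 5)
lumped
```

Merging by meaning (e.g., grouping related thematic areas) is usually preferable to a purely numeric threshold, but the threshold is a quick first pass.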

Following the Professor’s suggestion, we tried a very simple ARIMA model with regressors, which is essentially a DHR model without the Fourier terms.
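One detail matters when moving to auto.arima(): to our understanding of the forecast package, the seasonal period is taken from frequency(x), and a plain numeric vector has frequency 1, so no seasonal terms can be fitted regardless of `seasonal = TRUE`. A minimal base-R sketch on hypothetical toy counts:

```r
# Toy daily counts (hypothetical)
y <- c(310, 295, 330, 280, 150, 120, 305, 300, 290, 335, 275, 160, 115, 310)

frequency(y)                       # 1: a plain vector carries no seasonal period

y_weekly <- ts(y, frequency = 7)   # declare a weekly cycle for daily data
frequency(y_weekly)                # 7
```

For several seasonal periods at once (e.g., weekly and yearly), the forecast package provides msts() with a `seasonal.periods` argument.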

ARIMA Simple Model

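library(forecast)   # provides auto.arima() and forecast()

# Model with auto.arima (regressors: daily counts per type x area combination)
res <- auto.arima(ts_univariate$Numero_Segnalazioni,
                  xreg = as.matrix(wide_data_NODATE), seasonal = TRUE)

# Forecasts ----------------------
# NOTE: the training regressors are reused as "future" values, so this is not
# a genuine out-of-sample forecast
suppressMessages({
  forecast_res <- forecast(res, xreg = as.matrix(wide_data_NODATE))
})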

# Display the results
forecast_res$model
## Series: ts_univariate$Numero_Segnalazioni 
## Regression with ARIMA(0,0,0) errors 
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##       intercept  AMA LineaVerde_GESTIONE RIFIUTI E PULIZIA URBANA
##               0                                                 1
## s.e.        NaN                                               NaN
##       Reclamo_OPERE E MANUTENZIONE DELLA CITTA  Reclamo_SERVIZI MUNICIPALI
##                                              1                           1
## s.e.                                       NaN                         NaN
##       Reclamo_SICUREZZA URBANA  Segnalazione_ANAGRAFE E SERVIZI CIVICI
##                          1e+00                                   1e+00
## s.e.                     2e-04                                   1e-04
##       Segnalazione_MOBILITA E TRASPORTI
##                                       1
## s.e.                                NaN
##       Segnalazione_OPERE E MANUTENZIONE DELLA CITTA
##                                                   1
## s.e.                                            NaN
##       Segnalazione_SERVIZI MUNICIPALI  Segnalazione_SICUREZZA URBANA
##                                     1                              1
## s.e.                              NaN                            NaN
##       Segnalazione_TURISMO  Reclamo_GESTIONE RIFIUTI E PULIZIA URBANA
##                          1                                          1
## s.e.                   NaN                                        NaN
##       Reclamo_MOBILITA E TRASPORTI  Reclamo_TRIBUTI E CONTRAVVENZIONI
##                                  1                                  1
## s.e.                           NaN                                NaN
##       Segnalazione_AMBIENTE  Segnalazione_GESTIONE RIFIUTI E PULIZIA URBANA
##                           1                                               1
## s.e.                    NaN                                             NaN
##       Segnalazione_INNOVAZIONE E SMART CITY  Segnalazione_PATRIMONIO
##                                           1                        1
## s.e.                                    NaN                      NaN
##       Segnalazione_TRIBUTI E CONTRAVVENZIONI  Reclamo_ANAGRAFE E SERVIZI CIVICI
##                                            1                                  1
## s.e.                                     NaN                                NaN
##       Reclamo_CASA E URBANISTICA  Reclamo_INNOVAZIONE E SMART CITY
##                                1                                 1
## s.e.                         NaN                               NaN
##       Reclamo_SCUOLA  Reclamo_SOCIALE  Segnalazione_CASA E URBANISTICA
##                    1            1e+00                                1
## s.e.             NaN            1e-04                              NaN
##       Segnalazione_CULTURA  Segnalazione_SOCIALE  Reclamo_AMBIENTE
##                      1e+00                     1                 1
## s.e.                 6e-04                   NaN               NaN
##       Segnalazione_COMMERCIO E IMPRESA  Segnalazione_SPORT  Segnalazione_SCUOLA
##                                      1                   1                    1
## s.e.                               NaN                   0                  NaN
##       Segnalazione_DIRITTI E PARI OPPORTUNITA  Reclamo_CULTURA
##                                             1            1e+00
## s.e.                                      NaN            1e-04
##       Reclamo_COMMERCIO E IMPRESA  Reclamo_DIRITTI E PARI OPPORTUNITA
##                                 1                                   1
## s.e.                          NaN                                 NaN
##       Reclamo_SPORT  Segnalazione_ALTRI SERVIZI  Reclamo_ALTRI SERVIZI
##                   1                           1                  1e+00
## s.e.            NaN                         NaN                  1e-04
##       Segnalazione_GESTIONE DELL'ENTE  Reclamo_GESTIONE DELL'ENTE
##                                     1                           1
## s.e.                              NaN                         NaN
##       Reclamo_PATRIMONIO  Reclamo_TURISMO  Segnalazione_SANITA E SALUTE
##                        1                1                             1
## s.e.                   0                0                             0
## 
## sigma^2 = 1.088e-25:  log likelihood = 9995.07
## AIC=-19902.14   AICc=-19889.77   BIC=-19730.55
# Plot the forecast
plot(forecast_res)


Reading the prediction intervals (illustrative values): for the first future period (2024.1), a point forecast of 100.23 with an 80% interval of 90.12 to 110.34 and a 95% interval of 85.67 to 115.89 means that, under the model's assumptions, the actual value should fall in the first range with 80% probability and in the second with 95% probability.

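The auto.arima fit selects ARIMA(0,0,0) errors, i.e., errors with no autoregressive, moving average, or differencing components. Here this is not evidence that the data lack temporal structure. The regressors are the daily counts per report type and thematic area, which add up to the daily total we are trying to predict, so the regression reproduces the target exactly (intercept 0, every coefficient 1, sigma^2 of about 1e-25) and leaves no residual structure for the ARIMA part to model; the NaN standard errors are a symptom of this degenerate fit. In addition, ts_univariate$Numero_Segnalazioni is a plain numeric vector with frequency 1, so seasonal = TRUE cannot add seasonal terms.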
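The degenerate fit is easy to reproduce: when the target is the sum of the regressors, ordinary least squares returns an intercept of 0, slopes of 1, and zero residuals. A minimal sketch with simulated category counts (the names and Poisson means are hypothetical), using base-R lm().

```r
set.seed(123)
n <- 365

# Simulated daily counts for three categories (hypothetical)
X <- cbind(
  rifiuti  = rpois(n, 150),
  mobilita = rpois(n, 40),
  sociale  = rpois(n, 5)
)
total <- rowSums(X)   # the target is, by construction, the sum of the regressors

fit <- lm(total ~ X)
round(coef(fit), 6)        # intercept 0, every slope 1
max(abs(residuals(fit)))   # essentially zero: a perfect but uninformative fit
```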

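Once the regressors are redefined so that they do not contain the target (for example, by using lagged category counts, which are known when the forecast is made), we can check whether the data show exploitable temporal patterns. If they do not, we may need alternatives to ARIMA, such as regression models, machine learning models, or other statistical methods.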
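One common way to obtain regressors that are available at forecast time is to lag them, e.g., using yesterday's category counts to explain today's total. A minimal base-R sketch; the helper `lag_matrix` and the toy values are hypothetical.

```r
# Hypothetical helper: shift every column down by k rows, so row t holds the
# value observed at t - k (NA for the first k rows).
lag_matrix <- function(X, k = 1) {
  n <- nrow(X)
  out <- rbind(matrix(NA_real_, nrow = k, ncol = ncol(X)),
               X[seq_len(n - k), , drop = FALSE])
  colnames(out) <- paste0(colnames(X), "_lag", k)
  out
}

# Toy daily counts (hypothetical)
X <- cbind(rifiuti = c(150, 160, 140, 155), mobilita = c(40, 38, 45, 41))
L <- lag_matrix(X, k = 1)
L
```

The first k rows must be dropped before fitting, and forecasting more than k days ahead would require forecasts of the lagged regressors themselves.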

Moreover, we divided the work: some of us searched for relevant information on DHR and the Kalman filter, delving into the papers cited above. The Frequency parameter will likely be especially important, and more generally we need to study how to set the seasonality with respect to the data we actually have.

Several more trials can be conducted, especially addressing the following issues:

Alternative Methods

2. Difficulties

As suggested by your comments, the difficulties we found in the data mainly concern the two datasets described before. The original dataset could be used to derive other time series: in particular, it could be interesting to model complaints not only in aggregate but by Municipality and Thematic Area, since these patterns are more relevant for logistics.

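In this case, we use standard seasonality, setting the Frequency parameter to the periods 24 and 168, which correspond to a day and a week for hourly data. Since our series has one observation per day, this choice is debatable and should be compared with other settings (for a weekly cycle in daily data, the period is 7). We should also identify and study techniques to check our prior, subjective beliefs about seasonality.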
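One standard way to check a prior belief about seasonality is the periodogram: a strong cycle shows up as a peak at frequency 1/period. A minimal sketch with base R's spec.pgram() on a simulated daily series with a weekly cycle (hypothetical data, not our complaints series).

```r
set.seed(42)
n_days <- 365
t <- seq_len(n_days)

# Simulated daily counts with a weekly cycle plus noise (hypothetical)
y <- 300 + 40 * sin(2 * pi * t / 7) + rnorm(n_days, sd = 10)

# Raw periodogram; fast = FALSE avoids padding so frequencies are k / 365
pg <- spec.pgram(y, taper = 0, fast = FALSE, detrend = TRUE, plot = FALSE)

# Period (in days) of the strongest peak
dominant_period <- 1 / pg$freq[which.max(pg$spec)]
dominant_period   # close to 7
```

On the real series, peaks near 1/7 would support a weekly period; one year of data is too short to check a yearly cycle this way.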

We expect that it will not be easy to interpret the results of the model and to integrate all the information into a single method. We should consider reducing the number of variables, or using hierarchical models such as trees, or simulation techniques.

Furthermore, seasonality is particularly interesting here. Since we have no previous case study, we need to build our expectations from the call center data itself: if there are both short-term and long-term seasonal patterns, we could model them accordingly. Since we have only a few years of data, we focus on short-term seasonality.


References

Portal Open Data

The list of the main articles is available on Moodle as a .bib file.

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont.

  • Li, T.-H., & Hinich, M. J. (2002). A Filter Bank Approach for Modeling and Forecasting Seasonal Patterns. Technometrics, 44, pp. 1-14.

  • Vercauteren, T., Aggarwal, P., & Wang, X. (2012). Tree models for difference and change detection in a complex environment. The Annals of Applied Statistics, 6, pp. 1286-1297.

  • Vercauteren, T., Aggarwal, P., Wang, X., & Li, T.-H. (2007). Hierarchical Forecasting of Web Server Workload Using Sequential Monte Carlo Training. IEEE Transactions on Signal Processing, 55, pp. 1162-1184.

  • Rausch, T. M., Albrecht, T., & Baier, D. (2022). Beyond the beaten paths of forecasting call center arrivals: On the use of dynamic harmonic regression with predictor variables. Journal of Business Economics, 92, 675–706.


Project Timeline


Here is a sketch of our updated timeline: